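Despite the many applications and successes of deep reinforcement learning in control tasks, it still suffers from several key problems and limitations, including temporal credit assignment with sparse rewards, a lack of effective exploration, and brittle convergence that is highly sensitive to hyperparameters. The problems of deep reinforcement learning in continuous control, together with the success of evolutionary algorithms in addressing some of them, gave rise to the idea of evolutionary reinforcement learning, which has attracted considerable controversy. Although a few studies in this area have reported successful results, a proper solution to these problems and limitations has yet to be proposed. This study further examines the efficiency of combining deep reinforcement learning with evolutionary computation and takes a step toward improving existing methods and addressing open challenges. The "Evolutionary Deep Reinforcement Learning Using Elite Buffer" algorithm introduces a new mechanism inspired by interactive learning ability and hypothetical outcomes in the human brain. In this method, the use of an elite buffer (inspired by experience generalization in the human mind), together with crossover and mutation operators and interactive learning across successive generations, improves efficiency, convergence, and steady progress in continuous control. According to the experimental results, the proposed method outperforms other well-known methods in environments with high complexity and dimensionality, and it is better at resolving the problems and limitations mentioned above.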
Translated by Google Translate
Fine-tuning a Pre-trained Language Model (PLM) on a specific downstream task is a well-known paradigm in Natural Language Processing. However, with the ever-growing size of PLMs, training the entire model on several downstream tasks becomes very expensive and resource-hungry. Recently, different Parameter Efficient Tuning (PET) techniques have been proposed to improve the efficiency of fine-tuning PLMs. One popular category of PET methods is the low-rank adaptation methods, which insert learnable truncated SVD modules into the original model either sequentially or in parallel. However, low-rank decomposition suffers from limited representation power. In this work, we address this problem by using the Kronecker product instead of the low-rank representation. We introduce KronA, a Kronecker product-based adapter module for efficient fine-tuning of Transformer-based PLMs. We apply the proposed methods to fine-tune T5 on the GLUE benchmark and show that incorporating the Kronecker-based modules can outperform state-of-the-art PET methods.
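The representation-power argument in the KronA abstract can be illustrated with a small sketch. This is not the KronA adapter itself: the matrices `A`, `B`, `u`, and `v` and the helper names are hypothetical toy values chosen for illustration. It only shows that, for the same number of trainable values, a Kronecker product of two small full-rank factors can reach a higher rank than a rank-1 low-rank update.

```python
from fractions import Fraction


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_val * b_val for a_val in a_row for b_val in b_row]
        for a_row in a
        for b_row in b
    ]


def outer(u, v):
    """Outer product u v^T, i.e. a rank-1 low-rank update."""
    return [[ui * vj for vj in v] for ui in u]


def matrix_rank(m):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in m]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


# Both updates below spend 8 trainable numbers on a 4x4 matrix.
A = [[1, 2], [3, 4]]          # hypothetical 2x2 factor (4 values)
B = [[0, 1], [1, 1]]          # hypothetical 2x2 factor (4 values)
kron_update = kron(A, B)

u = [1, 2, 3, 4]              # hypothetical 4x1 factor (4 values)
v = [1, 0, -1, 2]             # hypothetical 1x4 factor (4 values)
low_rank_update = outer(u, v)

print("Kronecker update rank:", matrix_rank(kron_update))
print("Low-rank update rank:", matrix_rank(low_rank_update))
```

Because the rank of a Kronecker product equals the product of the factor ranks, the Kronecker update here is full rank, while the outer product stays at rank 1 regardless of the values chosen.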
The COVID-19 pandemic has caused drastic alterations in all aspects of human life. Government regulations issued in response affected everyone's lifestyle. Studying people's sentiment is therefore essential for anticipating the impact of future pandemics. To contribute to this aim, we propose an NLP (Natural Language Processing) model that analyzes open-text answers from a survey in Persian and detects the positive and negative feelings of people in Iran. In this study, a DistilBERT transformer model was applied to this task. We compared three approaches, and our best model achieved an accuracy of 0.824, a precision of 0.824, a recall of 0.798, and an F1 score of 0.804.
Bi-encoders and cross-encoders are widely used in many state-of-the-art retrieval pipelines. In this work we study the generalization ability of these two types of architectures across a wide range of parameter counts in both in-domain and out-of-domain scenarios. We find that the number of parameters and the early query-document interactions of cross-encoders play a significant role in the generalization ability of retrieval models. Our experiments show that increasing model size results in marginal gains on in-domain test sets, but much larger gains in new domains never seen during fine-tuning. Furthermore, we show that cross-encoders largely outperform bi-encoders of similar size in several tasks. In the BEIR benchmark, our largest cross-encoder surpasses a state-of-the-art bi-encoder by more than 4 average points. Finally, we show that using bi-encoders as first-stage retrievers provides no gains in comparison to a simpler retriever such as BM25 on out-of-domain tasks. The code is available at https://github.com/guilhermemr04/scaling-zero-shot-retrieval.git
Natural Language Inference (NLI) or Recognizing Textual Entailment (RTE) aims at predicting the relation between a pair of sentences (premise and hypothesis) as entailment, contradiction or semantic independence. Although deep learning models have shown promising performance for NLI in recent years, they rely on large-scale, expensive human-annotated datasets. Semi-supervised learning (SSL) is a popular technique for reducing the reliance on human annotation by leveraging unlabeled data for training. SSL has had substantial success on single-sentence classification tasks, where the challenge in making use of unlabeled data is to assign "good enough" pseudo-labels. For NLI tasks, however, the nature of unlabeled data is more complex: one of the sentences in the pair (usually the hypothesis) and the class label are both missing from the data and require human annotation, which makes SSL for NLI more challenging. In this paper, we propose a novel way to incorporate unlabeled data in SSL for NLI, in which we use a conditional language model, BART, to generate the hypotheses for the unlabeled sentences (used as premises). Our experiments show that our SSL framework successfully exploits unlabeled data and substantially improves performance on four NLI datasets in low-resource settings. We release our code at: https://github.com/msadat3/SSL_for_NLI.
Automatic topic classification has been studied extensively to assist in managing and indexing scientific documents in a digital collection. With the large number of topics that have become available in recent years, it has become necessary to arrange them in a hierarchy. Therefore, automatic classification systems need to be able to classify documents hierarchically. In addition, each paper is often assigned to more than one relevant topic. For example, a paper can be assigned to several topics in a hierarchy tree. In this paper, we introduce a new dataset for hierarchical multi-label text classification (HMLTC) of scientific papers called SciHTC, which contains 186,160 papers and 1,233 categories from the ACM CCS tree. We establish strong baselines for HMLTC and propose a multi-task learning approach for topic classification with keyword labeling as an auxiliary task. Our best model achieves a Macro-F1 score of 34.57%, which shows that this dataset provides significant research opportunities on hierarchical scientific topic classification. We make our dataset and code available on GitHub.
Existing metrics for evaluating the quality of automatically generated questions such as BLEU, ROUGE, BERTScore, and BLEURT compare the reference and predicted questions, providing a high score when there is a considerable lexical overlap or semantic similarity between the candidate and the reference questions. This approach has two major shortcomings. First, we need expensive human-provided reference questions. Second, it penalises valid questions that may not have high lexical or semantic similarity to the reference questions. In this paper, we propose a new metric, RQUGE, based on the answerability of the candidate question given the context. The metric consists of a question-answering and a span scorer module, in which we use pre-trained models from the existing literature, and therefore, our metric can be used without further training. We show that RQUGE has a higher correlation with human judgment without relying on the reference question. RQUGE is shown to be significantly more robust to several adversarial corruptions. Additionally, we illustrate that we can significantly improve the performance of QA models on out-of-domain datasets by fine-tuning on the synthetic data generated by a question generation model and re-ranked by RQUGE.
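Transformer-based models are used to achieve state-of-the-art performance on various deep learning tasks. Because transformer-based models have a large number of parameters, fine-tuning them on downstream tasks is computationally intensive and energy hungry. Automatic mixed-precision FP32/FP16 fine-tuning of such models has previously been used to lower the compute resource requirements. However, with recent advances in low-bit integer back-propagation, it is possible to further reduce the computation and memory footprint. In this work, we explore a novel integer training method that uses integer arithmetic for both forward propagation and gradient computation of the linear, convolutional, layer-norm, and embedding layers in transformer-based models. Furthermore, we study the effect of various integer bit-widths to find the minimum bit-width required for integer fine-tuning of transformer-based models. We fine-tune BERT and ViT models on popular downstream tasks using integer layers. We show that 16-bit integer models match the floating-point baseline performance. Reducing the bit-width to 10, we observe an average score drop of 0.5. Finally, further reducing the bit-width to 8 gives an average score drop of 1.7 points.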
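The effect of bit-width on an integer computation can be sketched with a toy dot product. This is a generic symmetric per-tensor quantization scheme, assumed here for illustration only; the abstract does not specify the number format the authors use, and the function names and input values are hypothetical.

```python
def quantize(values, bits):
    """Map floats to signed integers of the given bit-width with one shared scale."""
    q_max = 2 ** (bits - 1) - 1                    # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values) or 1.0   # avoid a zero scale
    scale = max_abs / q_max
    return [round(v / scale) for v in values], scale


def int_dot(x, w, bits):
    """Dot product accumulated in integers, rescaled to a float at the end."""
    qx, scale_x = quantize(x, bits)
    qw, scale_w = quantize(w, bits)
    acc = sum(a * b for a, b in zip(qx, qw))       # integer-only arithmetic
    return acc * scale_x * scale_w


# Hypothetical activations and weights of one output unit of a linear layer.
x = [0.5, -1.25, 2.0, 0.75]
w = [0.1, 0.4, -0.3, 0.9]
exact = sum(a * b for a, b in zip(x, w))

for bits in (16, 10, 8):
    approx = int_dot(x, w, bits)
    print(f"{bits}-bit: value={approx:.6f} abs_error={abs(approx - exact):.6f}")
```

With one shared scale per tensor, the worst-case rounding error of each value is half a quantization step, so the error bound roughly doubles for every bit removed. That is consistent in spirit with the reported trend of small degradation at 10 bits and larger degradation at 8 bits, although the sketch says nothing about task-level scores.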
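The evaluation of recent embedding-based evaluation metrics for text generation is primarily based on measuring their correlation with human evaluations on standard benchmarks. However, these benchmarks mostly come from domains similar to those used for pretraining word embeddings. This raises concerns about the (lack of) generalization of embedding-based metrics to new and noisy domains that contain a different vocabulary than the pretraining data. In this paper, we examine the robustness of BERTScore, one of the most popular embedding-based metrics for text generation. We show that (a) an embedding-based metric that has the highest correlation with human evaluations on a standard benchmark can have the lowest correlation if the amount of input noise or unknown tokens increases, (b) taking embeddings from the first layer of pretrained models improves the robustness of all metrics, and (c) the highest robustness is achieved when using character-level embeddings, instead of token-based embeddings, from the first layer of the pretrained model.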
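For readers unfamiliar with how an embedding-based metric such as BERTScore compares two texts, the core greedy-matching step can be sketched as follows. This is a simplified illustration: the two-dimensional vectors are hypothetical stand-ins for contextual token embeddings, and optional parts of the metric such as IDF weighting and baseline rescaling are omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_match_score(candidate, reference):
    """Precision, recall and F1 from greedy token-embedding matching."""
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(
        max(cosine(c, r) for r in reference) for c in candidate
    ) / len(candidate)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(
        max(cosine(r, c) for c in candidate) for r in reference
    ) / len(reference)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical 2-d "embeddings": the candidate has one extra, unrelated token.
reference_tokens = [[1.0, 0.0]]
candidate_tokens = [[1.0, 0.0], [0.0, 1.0]]
print(greedy_match_score(candidate_tokens, reference_tokens))
```

Since every score depends only on the token embeddings, a noisy or out-of-vocabulary token that receives a poor embedding directly distorts the matching, which is the sensitivity the abstract investigates.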
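Activation functions can have a significant impact on reducing the topological complexity of input data and can therefore improve model performance. Selecting a suitable activation function is an essential step in neural model design. However, the choice of activation function is seldom discussed or explored in Transformer-based language models. Their activation functions are chosen beforehand and then remain fixed from pre-training through fine-tuning. As a result, the inductive biases they impose on the model cannot be adjusted during this long life cycle. Moreover, subsequently developed models (e.g., RoBERTa, BART, and GPT-3) often follow prior work (e.g., BERT) and use the same activation function without justification. In this paper, we investigate the effectiveness of using Rational Activation Functions (RAFs) in the Transformer architecture. In contrast to conventional, predefined activation functions, RAFs can adaptively learn optimal activation functions according to the input data. Our experiments show that the RAF-based Transformer (RAFT) achieves a lower validation perplexity than a vanilla BERT with the GELU function. We further evaluate RAFT on downstream tasks in low-data and full-data settings. Our results show that RAFT outperforms the counterpart model across the majority of tasks and settings. For instance, RAFT outperforms the counterpart by 5.71 points on average on the GLUE benchmark in the low-data scenario (with 100 training examples) and by 2.05 points on SQuAD in the full-data setting. Analysis of the shapes of the learned RAFs further reveals that they vary substantially between different layers of the pre-trained model and mostly look very different from conventional activation functions. RAFT opens a new research direction for analyzing and interpreting pre-trained models according to the learned activation functions.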
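A rational activation function is a ratio of two polynomials whose coefficients are learned. The sketch below evaluates one such function with plain floats; in a real model the coefficients would be trainable parameters. The "safe" denominator form `1 + |...|`, which keeps the denominator from reaching zero, is an assumption borrowed from common formulations of rational activations, and the polynomial degrees and coefficient values are hypothetical, not taken from the abstract.

```python
def rational_activation(x, numerator, denominator):
    """Evaluate P(x) / Q(x) for a scalar input.

    P(x) = numerator[0] + numerator[1] * x + numerator[2] * x**2 + ...
    Q(x) = 1 + |denominator[0] * x + denominator[1] * x**2 + ...|
    Q(x) is always at least 1, so the division is well defined.
    """
    p = sum(a * x ** i for i, a in enumerate(numerator))
    q = 1.0 + abs(sum(b * x ** (j + 1) for j, b in enumerate(denominator)))
    return p / q


# With P(x) = x and no denominator terms, the function is the identity.
identity_like = rational_activation(3.0, [0.0, 1.0], [])

# Hypothetical coefficients: P(x) = x + x^2, Q(x) = 1 + |x|.
for x in (-2.0, 0.0, 2.0):
    print(x, rational_activation(x, [0.0, 1.0, 1.0], [1.0]))
```

Changing the coefficient lists changes the shape of the function, which is what allows each layer to end up with a different learned activation.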